Learning in Hybrid Noise Environments Using Statistical Queries

Author

  • Scott E. Decatur
Abstract

We consider formal models of learning from noisy data. Specifically, we focus on learning in the probably approximately correct model as defined by Valiant. Two of the most widely studied models of noise in this setting have been classification noise and malicious errors. However, a more realistic model combining the two types of noise has not been formalized. We define a learning environment based on a natural combination of these two noise models. We first show that hypothesis testing is possible in this model. We next describe a simple technique for learning in this model, and then describe a more powerful technique based on statistical query learning. We show that the noise tolerance of this improved technique is roughly optimal with respect to the desired learning accuracy and that it provides a smooth tradeoff between the tolerable amounts of the two types of noise. Finally, we show that statistical query simulation yields learning algorithms for other combinations of noise models, thus demonstrating that statistical query specification truly captures the generic fault tolerance of a learning algorithm.

An important goal of research in machine learning is to determine which tasks can be automated, and for those which can, to determine their information and computation requirements. One way to answer these questions is through the development and investigation of formal models of machine learning which capture the task of learning under plausible assumptions. In this work, we consider the formal model of learning from examples called "probably approximately correct" (PAC) learning as defined by Valiant [Val84]. In this setting, a learner attempts to approximate an unknown target concept simply by viewing positive and negative examples of the concept. An adversary chooses, from some specified function class, a hidden {0,1}-valued target function defined over some specified domain of examples and chooses a probability distribution over this domain.
The goal of the learner is to output, in polynomial time and with high probability, a hypothesis which is "close" to the target function with respect to the distribution of examples. The learner gains information about the target function and distribution by interacting with an example oracle. At each request by the learner, this oracle draws an example randomly according to the hidden distribution, labels it according to the hidden target function, and returns the labelled example to the learner. A class of functions F is said to be PAC learnable if … © 1996 Springer-Verlag.
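As a rough illustration of the example oracle and the hybrid noise model described above, the following sketch combines classification noise (each label flipped independently with some rate) with malicious errors (the adversary occasionally substitutes an arbitrary labelled example). The function and parameter names here are illustrative assumptions, not the paper's notation.

```python
import random

def noiseless_oracle(target, draw_example):
    """Draw x from the hidden distribution and label it by the target."""
    x = draw_example()
    return x, target(x)

def hybrid_noise_oracle(target, draw_example, eta, beta, adversary):
    """Sketch of a combined noise model: with probability beta the
    adversary returns an arbitrary labelled example (malicious errors);
    otherwise the true label is flipped with probability eta
    (classification noise)."""
    if random.random() < beta:
        return adversary()          # arbitrary (example, label) pair
    x = draw_example()
    label = target(x)
    if random.random() < eta:
        label = 1 - label           # flip the {0,1} label
    return x, label

# demo: with both noise rates set to 0 the oracle is noiseless
x, y = hybrid_noise_oracle(lambda z: z % 2,
                           lambda: random.randrange(10),
                           eta=0.0, beta=0.0,
                           adversary=lambda: (0, 0))
```

Setting `beta = 0` recovers the pure classification noise model, and `eta = 0` recovers the pure malicious error model, which is the smooth tradeoff the abstract refers to.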


Similar Resources

Relational Databases Query Optimization using Hybrid Evolutionary Algorithm

Optimizing database queries is one of the hard research problems. Exhaustive search techniques like dynamic programming are suitable for queries with few relations, but as the number of relations in a query increases, much more memory and processing is needed, so these methods become unsuitable and we have to use random and evolutionary methods. The use of evolutionary methods, beca...


The effects of traffic noise on memory and auditory-verbal learning in Persian language children

Background: Acoustic noise is one of the universal pollutants of modern society. Although the adverse effects of high levels of noise on human hearing have been known for many years, non-auditory effects of noise, such as effects on cognition, learning, memory and reading, especially in children, have been less considered. Factors which have a negative impact on these features can also have a negat...


Statistical Query Learning (1993; Kearns)

The problem deals with learning {−1, +1}-valued functions from random labeled examples in the presence of random noise in the labels. In the random classification noise model of Angluin and Laird [1] the label of each example given to the learning algorithm is flipped randomly and independently with some fixed probability η called the noise rate. The model is the extension of Valiant’s PAC m...
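The classification noise model described above admits a simple bias correction for label-agreement statistics, which is the flavor of estimate that a statistical query simulation computes. The sketch below uses my own naming and a toy target, not Kearns' actual construction: if the true agreement rate between a hypothesis h and the target f is p, then under noise rate η the observed agreement is p_noisy = (1 − η)p + η(1 − p), so p = (p_noisy − η) / (1 − 2η) whenever η < 1/2.

```python
import random

def estimate_agreement(h, noisy_examples, eta):
    """Estimate Pr[h(x) == f(x)] from examples whose labels were
    flipped independently with probability eta, by inverting
    p_noisy = (1 - eta) * p + eta * (1 - p)."""
    agree = sum(1 for x, label in noisy_examples if h(x) == label)
    p_noisy = agree / len(noisy_examples)
    return (p_noisy - eta) / (1 - 2 * eta)

random.seed(1)
eta = 0.2
f = lambda x: x % 2          # hidden target (toy example)
h = lambda x: x % 2          # hypothesis under test (here: exact)
sample = []
for _ in range(100_000):
    x = random.randrange(1000)
    label = f(x) if random.random() >= eta else 1 - f(x)
    sample.append((x, label))
print(estimate_agreement(h, sample, eta))
```

Since h equals f here, the raw noisy agreement is only about 1 − η = 0.8, while the corrected estimate recovers a value near the true agreement of 1. The correction degrades as η approaches 1/2, matching the η < 1/2 requirement of the classification noise model.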


Learning with Queries Corrupted by Classification Noise

Kearns introduced the “statistical query” (SQ) model as a general method for producing learning algorithms which are robust against classification noise. We extend this approach in several ways in order to tackle algorithms that use “membership queries”, focusing on the more stringent model of “persistent noise”. The main ingredients in the general analysis are: (1) Smallness of dimension of th...


On Using Extended Statistical Queries to Avoid Membership Queries

The Kushilevitz-Mansour (KM) algorithm is an algorithm that finds all the “large” Fourier coefficients of a Boolean function. It is the main tool for learning decision trees and DNF expressions in the PAC model with respect to the uniform distribution. The algorithm requires access to the membership query (MQ) oracle. The access is often unavailable in learning applications and thus the KM algo...



Journal title:

Volume   Issue

Pages  -

Publication date: 1995